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Abstract. Linear system solving is one of the main workhorses in applied mathematics. Re¬ 
cently, theoretical computer scientists have contributed sophisticated algorithms for solving 
linear systems with symmetric diagonally dominant matrices (a class to which Laplacian ma¬ 
trices belong) in provably nearly-linear time. While these algorithms are highly interesting 
from a theoretical perspective, there are no published results how they perform in practice. 
With this paper we address this gap. We provide the first implementation of the combinatorial 
solver by [Kelner et ah, STOC 2013], which is particularly appealing for implementation due 
to its conceptual simplicity. The algorithm exploits that a Laplacian matrix corresponds to a 
graph; solving Laplacian linear systems amounts to finding an electrical flow in this graph 
with the help of cycles induced by a spanning tree with the low-stretch property. 

The results of our comprehensive experimental study are ambivalent. They confirm a nearly- 
linear running time, but for reasonable inputs the constant factors make the solver much 
slower than methods with higher asymptotic complexity. One other aspect predicted by the¬ 
ory is confirmed by our findings, though: Spanning trees with lower stretch indeed reduce 
the solver’s running time. Yet, simple spanning tree algorithms perform in practice better 
than those with a guaranteed low stretch. 


1 Introduction 


Solving square linear systems Ax = b, where A and x,b S", has been one of the most 

important problems in applied mathematics with wide applications in science and engineering. In 
practice system matrices are often sparse, i.e. they contain o(n^) nonzeros. Direct solvers with 
cubic running times do not exploit sparsity. Ideally, the required time for solving sparse systems 
would grow linearly with the number of nonzeros 2m. Moreover, approximate solutions usually 
suffice due to the imprecision of floating point arithmetic. Spielman and Teng ll24l . following an 
approach proposed by Vaidya na, achieved a major breakthrough in this direction by devising 
a nearly-linear time algorithm for solving linear systems in symmetric diagonally dominant ma¬ 
trices. Nearly-linear means 0{m ■ polylog(n) • log(l/e)) here, where polylog(n) is the set of 
real polynomials in log(n) and e is the relative error \\x — a:opt|U/||2;opt|| A we want for the solu¬ 
tion X G". Here || • ||a is the norm ||a;||A := \/x"^Ax given by A, and Xopt := A'^b is an exact 
solution. A matrix A = G is diagonally dominant if |a«l > Ej/i l«iil for all 

i G [n]. Symmetric matrices that are diagonally dominant (SDD matrices) have many applica¬ 
tions: In elliptic PDEs Q, maximum flows l8l, and sparsifying graphs ll2^ . Thus, the problem 
INV-SDD of solving linear systems Ax = b for x on SDD matrices A is of significant importance. 
We focus here on Laplacian matrices (which are SDD) due to their rich applications in graph 
algorithms, e. g. load balancing cni, but this is no limitation m. 



Related work. Spielman and Teng’s seminal paper Il24l requires a lot of sophisticated machinery; 
a multilevel approach II27I22II using recursive preconditioning, preconditioners based on low- 
stretch spanning trees ll25l and spectral graph sparsifiers 0231161 . Later papers extended this ap¬ 
proach, both by making it simpler and by reducing the exponents of the polylogarithmic time 
factors|^We focus on a simplihed algorithm by Kelner et al. IH that reinterprets the problem of 
solving an SDD linear system as finding an electrical flow in a graph. It only needs low-stretch 
spanning trees and achieves O(mlog^nloglognlog(l/e)) time. 

Another interesting nearly-linear time SDD solver is the recursive sparsiflcation approach by 
Peng and Spielman ilD. Together with a parallel sparsiflcation algorithm, such as the one given 
by Koutis 021 , it yields a nearly-linear work parallel algorithm. 

Spielman and Teng’s algorithm crucially uses the low-stretch spanning trees first introduced 
by Alon et al. a . Elkin et al. IfTTl provide an algorithm for computing spanning trees with polyno¬ 
mial stretch in nearly-linear time. Specifically, they get a spanning tree with 0{m log^n log log n) 
stretch in 0{m log^n) time. Abraham et al. 11121 later showed how to get rid of some of the log¬ 
arithmic factors in both stretch and time. 

Motivation, Outline and Contribution. Although several extensions and simplifications to Spiel¬ 
man and Teng’s nearly-linear time solver ll24l have been proposed, none of them has been val¬ 
idated in practice so far. We seek to All this gap by implementing and evaluating an algorithm 
proposed by Kelner et al. Ill that is easier to describe and implement than Spielman and Teng’s 
original algorithm. Thus, in this paper we implement the KOSZ solver (the acronym follows from 
the authors’ last names) by Kelner et al. HI and investigate its practical performance. To this 
end, we start in Section]^ by settling notation and outlining KOSZ. In Section]^ we elaborate 
on the design choices one can make when implementing KOSZ. In particular, we explain when 
these choices result in a provably nearly-linear time algorithm. Section|^contains the heart of this 
paper, the experimental evaluation of the Laplacian solver KOSZ. We consider the configuration 
options of the algorithm, its asymptotics, its convergence and its use as a smoother. Our results 
confirm a nearly-linear running time, but at the price of very high constant factors, in part due to 
memory accesses. We conclude the paper in Section |^by summarizing the experimental results 
and discussing future research directions. 

2 Preliminaries 

Fundamentals. We consider undirected simple graphs G = (V, E) with n vertices and m edges. 
A graph is weighted if we have an additional function w: E —>^>o. Where necessary we consider 
unweighted graphs to be weighted with We = 1 Ve G E. We usually write an edge {m, v} C E 
as uv and its weight as Wuv Moreover, we define the set operations U, n and \ on graphs by 
applying them to the set of vertices and the set of edges separately. For every node m G E its 
neighbourhood Nq{u) is the set Nq{u) :={uGE : uv G E} of vertices v with an edge to u 
and its degree is c?„ = J^veNdu) '^nv The Laplacian matrix of a graph G = (V, E) is defined 
as := —Wyy if uv G E, J2xeNG{u) if u = v and 0 otherwise for u,v gV. A Laplacian 
matrix is always an SDD matrix. Another useful property of the Laplacian is the factorization 
L = where B is the incidence matrix and R is the resistance matrix 

defined by Bab,c = lifa = c, = —lif 6 = c and 0 otherwise. i?ei,e 2 = l/tt'ei if = 62 and 0 
otherwise. This holds for all ei, 62 G E and a,b,c G V, where we arbitrarily fix a start and end 

' Spielman provides a comprehensive overview of later work at http : / /www. cs . yale . edu/homes/ 
spielman/precon/precon. html (accessed on February 10, 2015). 
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node for each edge when defining B. With x'^Lx = {Bx)^R ^{Bx) = > 0 

(every summand is non-negative), one can see that L is positive semidefinite. (A matrix A 
is positive semidefinite if x"^Ax > 0 for all x €”.) 


Cycles, Spanning Trees and Stretch. A cycle in a graph is usually defined as a simple path that 
returns to its starting point and a graph is called Eulerian if there is a cycle that visits every 
edge exactly once. In this work we will interpret cycles somewhat differently: We say that a 
cycle in G is a subgraph G of G such that every vertex in G is incident to an even number of 
edges in G, i. e. a cycle is a union of Eulerian graphs. It is useful to define the addition Gi © G2 
of two cycles Gi,G2 to be the set of edges that occur in exactly one of the two cycles, i.e. 
Gi © G2 := (Gi \ G2) U (G2 \ Cl). In algebraic terms we can regard a cycle as a vector G C Ff" 
such that J2veNc{u) 1 = 0 in F2 for all rt G 1/ and the cycle addition as the usual addition on Ff. 
We call the resulting linear space of cycles C(G). 

In a spanning tree (ST) T = {V, Et) of G there is a unique path Pt{u, v) from every node u 
to every node v. For any edge e = uv G E\Et (an ojf-tree-edge with respect to T), the subgraph 
eU Pt {u, v) is a cycle, the basis cycle induced by e. One can easily show that the basis cycles 
form a basis of C(G). Thus, the basis cycles are very useful in algorithms that need to consider 
all the cycles of a graph. Another notion we need is a measure of how well a spanning tree 
approximates the original graph. We capture this by the stretch st(e) = {J^e'ePTiu v) ) /we of 
an edge e = uv G E. This stretch is the detour you need in order to get from one endpoint of the 
edge to the other if you stay in T, compared to the length of the original edge. In the literature the 
stretch is sometimes defined slightly differently, but we follow the definition in lfT4l using Wg. The 
stretch of the whole tree T is the sum of the individual stretches st(r) = st(e). Finding a 

spanning tree with low stretch is crucial for proving the fast convergence of the KOSZ solver. 


KOSZ (Simple) Solver. As illustrated in Figurej^in the appendix, we can regard G as an electrical 
network where each edge uv corresponds to a resistor with conductance Wuv and x as an assign¬ 
ment of potentials to the nodes of G. Then Xy — Xu is the voltage across uv and (xy — Xu) ■ Wuv 
is the resulting current along uv. Thus, {Lx)u is the current flowing out of u that we want to 
be equal to the right-hand side These interpretations used by the KOSZ solver are sum¬ 
marized in Table in the appendix. Furthermore, one can reduce solving SDD systems to the 
related problem INV-LAPLACIAN-CURRENT lfT4l : Given a Faplacian L — L{G) and a vector 
b G im(L), compute a function f ■. E ^ with (i) / being a valid graph flow on G with demand 
b and (ii) the potential drop along every cycle in G being zero, where a valid graph flow means 
that the sum of the incoming and outgoing flow at each vertex respects the demand in x and that 
f{u,v) = —f{v, u) Vuv G E. Also, is a bidirected copy of E and the potential drop of cycle G 
is X^eGC /(6)^e' The idea of the algorithm is to start with any valid flow and successively adjust 
the flow such that every cycle has potential zero. We need to transform the flow back to potentials 
at the end, but his can be done consistently, as all potential drops along cycles are zero. 

Regarding the crucial question of what flow to start with and how to choose the cycle to be 
repaired in each iteration, Kelner et al. Cl suggest using the cycle basis induced by a span¬ 
ning tree T of G and prove that the convergence of the resulting solver depends on the stretch 
of T. More specifically, they suggest starting with a flow that is nonzero only on T and weight¬ 
ing the basis cycles by their stretch when sampling them. The resulting algorithm is shown as 
Algorithm [T] note that we may stop before all potential drops are zero and we can consistently 
compute the potentials induced by / at the end by only looking at T. 
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Algorithm 1: inv-laplacian-current solver KOSZ. 
Input: Laplacian L = L{G) and vector b e im(L). 

Output: Solution x\o Lx = b. 

1 T <- a spanning tree of G 

2 / ^ unique flow with demand b that Is only nonzero on T 

3 while there is a cycle with potential drop / 0 /n / do 

4 c -s- cycle in C{T) chosen randomly weighted by stretch 
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« return vector of potentials in f with respect to the root ofT 


The solver described in Algorithm[2is actually just the SimpleSolver in Kelner et al.’s IH 
paper. They also show how to improve this solver by adapting preconditioning to the setting of 
electrical flows. In informal experiments we could not determine a strategy that is consistently 
better than the SimpleSolver, so we do not pursue this scheme any further here. Eventually, 
Kelner et al. iflTll derive the following running time for KOSZ: 

Theorem 1. ^741 Thin. 3.2] SimpleSolver can be implemented to run in time 
0(rn log^ n log log n log(e“^n)) while computing an e-approximation of x. 

3 Implementation 

While Algorithm[2provides the basic idea of the KOSZ solver, it leaves open several implemen¬ 
tation decisions that we elaborate on in this section. 

Spanning trees. As suggested by the convergence result in Theorem[2 the KOSZ solver depends 
on low-stretch spanning trees. Elkin et al. im presented an algorithm requiring nearly-linear 
time and yielding nearly-linear average stretch. The basic idea is to recursively form a spanning 
tree using a star of balls in each recursion step. We note that we use Dijkstra with binary heaps 
for growing the balls and that we take care not to need more work than necessary to grow the 
ball. In particular, ball growing is output-sensitive and growing a ball B{x,r) •.= {v & V : 
Distance from x to u is < r} should require O{d\ogn) time where d is the sum of the degrees 
of the nodes in B{x,r). The exponents of the logarithmic factors of the stretch of this algo¬ 
rithm were improved by subsequent papers (see Table in the appendix), but Papp Il20l showed 
experimentally that these improvements do not yield better stretch in practice. In fact, his exper¬ 
iments suggest that the stretch of the provable algorithms is usually not better than just taking a 
minimum-weight spanning tree. Therefore, we additionally use two simpler spanning trees with¬ 
out stretch guarantees: A minimum-distance spanning tree with Dijkstra’s algorithm and binary 
heaps; as well as a minimum-weight spanning with Kruskal’s algorithm using union-find with 
union-by-size and path compression. 

To test how dependent the algorithm is on the stretch of the ST, we also look at a special ST 
for rii X 712 grids. As depicted in Eigure[T] we construct this spanning tree by subdividing the 
ni X 172 grid into four subgrids as evenly as possible, recursively building the STs in the subgrids 
and connecting the subgrids by a U-shape in the middle. 


Proposition!. The special ST has O [ 


nin2 


) average stretch on an ni x n 2 grid. 
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(a) Recursive construction (b) ST for ni = n 2 = 4 

Fig. 1: Special spanning tree with 0 ^ ^ average stretch for the m x n 2 grid. 

Flows on trees. Since every basis cycle contains exactly one off-tree-edge, the flows on off-tree- 
edges can simply be stored in a single vector. To be able to efficiently get the potential drop of 
every basis cycle and to be able to add a constant amount of flow to it, the core problem is to 
efficiently store and update flows in T. More formally, we want to support the following two 
operations for all u,v G V and ct S on the flow /: 

- query(u, u): return the potential drop J2eePTiu,v) /(e)’'e 

- update(u, v, a)\ set /(e) := /(e) -f a for all e S Pt{u, v) 

We can simplify the operations by fixing v to be the root r of T: query(M): return the poten¬ 
tial drop J2eePT{u r) /(6)*’e and update(u, a): set /(e) := /(e) + a for all e G Pt{u, r). The 
itemized two-node operations can then be supported with query(it, v) := query(it) — query(u) 
and update(ti, u, a) := {update(u, a) andupdate(u, —a)} since the changes on the subpath 
P7’(r, LCA(tt, u)) cancel out. Here LCA(u, u) is the lowest common ancestor of the nodes u 
and V in T, the node farthest from r that is an ancestor of both u and v. We provide two ap¬ 
proaches for implementing the operations, first an implementation of the one-node operations 
that stores the flow directly on the tree and uses the definitions of the operations without modifi¬ 
cation. Obviously, these operations require 0{n) worst-case time and 0{n) space. With an LCA 
data structure, one can implement the itemized two-node operations without the subsequent sim¬ 
plification of using one-node operations. This does not improve the worst-case time, but can help 
in practice. Secondly, we use the improved data structure by Kelner et al. iflTll that guarantees 
O(logn) worst-case time but uses O(nlogn) space. In this case the one-node operations boil 
down to a dot product (query) and an addition (update) of a dense vector and a sparse vector. 
We unroll the recursion within the data structure for better performance in practice. 

Cycle selection. The easiest way to select a cycle is to choose an off-tree edge uniformly at 
random in Oil) time. However, to get provably good results, we need to weight the off-tree- 
edges by their stretch. We can use the flow data structure described above to get the stretches. 
More specifically, the data structure initially represents / = 0. For every off-tree edge uv we first 
execute update(u, v, 1), then query(M, v) to get v) Anally update(w, v, —1) to 

return to / = 0. This results in 0(m log n) time to initialize cycle selection. Once we have the 
weights, we use roulette wheel selection in order to select a cycle in 0{logm) time after an 
additional 0{m) time initialization. 
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For convenience we summarize the implementation choices for Algorithm [T] in Table [^(ap¬ 
pendix). The top-level item in each section is the running time of the best sub-item that can be 
used to get a provably good running time. The convergence theorem requires a low-stretch span¬ 
ning tree and weighted cycle selection. Note that m = I7(n) as G is connected. 

4 Evaluation 

4.1 Settings 

We implemented the KOSZ solver in C-n- using NetworKit ll2^ . a toolkit focused on scalable 
network analysis algorithms. As compiler we use g++ 4.8.3. The benchmark platform is a dual¬ 
socket server with two 8-core Intel Xeon E5-2680 at 2.7 GHz each and 256 GB RAM. Only a 
representative subset of our experiments are shown here. More experiments and their detailed dis¬ 
cussion can be found in 1(131 . We compare our KOSZ implementation to existing linear solvers as 
implemented by the libraries Eigen 3.2.2 II2 and Paralution 0.7.0 ESI. CPU performance char¬ 
acteristics such as the number of executed PLOPS (floating point operations), etc. are measured 
with the PAPI library HT). 

We mainly use two graph classes for our tests: (i) Rectangular k x I grids given by Gk,i ■= 
([A:] X [I],{{{xi,yi),{x2,y2)} C (^) : \xi - X2\ = 1 V |t/i - 2/2I = l}). Laplacian systems 
on grids are, for example, crucial for solving boundary value problems on rectangular domains; 
(ii) Barabasi-Albert 11 random graphs with parameter k. These random graphs are parametrized 
with a so-called attachment k. Their construction models that the degree distribution in many 
natural graphs is not uniform at all. Por both classes of graphs, we consider both unweighted and 
weighted variants (uniform random weights in [1, 8)). We also did informal tests on 3D grids and 
graphs that were not generated synthetically. These graphs did not exhibit significantly different 
behavior than the two graph classes above. 

4.2 Results 

Spanning tree. Papp Il20ll tested various low-stretch spanning tree algorithms and found that in 
practice the provably good low-stretch algorithms do not yield better stretch than simply using 
Kruskal. We confirm and extend this observation by comparing our own implementation of Elkin 
et al.’s El low-stretch ST algorithm to Kruskal and Dijkstra in Pigure [^ Except for the un¬ 
weighted 100 X 100 grid, Elkin has worse stretch than the other algorithms and Kruskal yields a 
good ST. Por Barabasi-Albeit graphs, Elkin is extremely bad (almost factor 20 worse). Interest¬ 
ingly, Kruskal outperforms the other algorithms even on the unweighted Barabasi-Albert graphs, 
where it degenerates to choosing an arbitrary ST. Pigure]^ also shows that our special ST yields 
significantly lower stretch for the unweighted 2D grid, but it does not help in the weighted case. 

Convergence. In Pigure]^ we plot the convergence of the residual for different graphs and differ¬ 
ent algorithm settings. We examined a 100 x 100 grid and a Barabasi-Albert graph with 25,000 
nodes. While the residuals can increase, they follow a global downward trend. Also note that the 
spikes of the residuals are smaller if the convergence is better. In all cases the solver converges 
exponentially, but the convergence speed crucially depends on the solver settings. If we select cy¬ 
cles by their stretch, the order of the convergence speeds is the same as the order of the stretches 
of the ST (cmp. Pigure |^, except for the Dijkstra ST and the Kruskal ST on the weighted grid. 
In particular, for the Elkin ST on Barabasi-Albert graphs, there is a significant gap to the other 
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100 X 100 grid, unweighted 


100 X 100 grid, weighted 


Barabasi(25000,4), unweighted 


Barabasi(25000,4), weighted 
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Fig. 2: Average stretch st{T)/m with different ST algorithms. 


settings where the solver barely converges at all and the special ST wins. Thus, low-stretch STs 
are crucial for convergence. In informal experiments we also saw this behavior for 3D grids and 
non-synthetic graphs. 

We could not detect any correlation between the improvement made by a cycle repair and the 
stretch of the cycle. Therefore, we cannot fully explain the different speeds with uniform cycle 
selection and stretch cycle selection. For the grid the stretch cycle selection wins, while Barabasi- 
Albert graphs favor uniform cycle selection. Another interesting observation is that most of the 
convergence speeds stay constant after an initial fast improvement at the start to about residual 1. 
That is, there is no significant change of behavior or periodicity. Even though we can hugely 
improve convergence by choosing the right settings, even the best convergence is still very slow, 
e.g. we need about 6 million iterations (« 3000 sparse matrix-vector multiplications (SpMVs) in 
time comparison) on a Barabasi-Albert graph with 25,000 nodes and 100,000 edges in order to 
reach residual 10““^. In contrast, conjugate gradient (CG) without preconditioning only needs 204 
SpMVs for this graph. 

Asymptotics. Now that we know which settings of the algorithm yield the best performance for 
2D grids and Barabasi-Albert graphs, we proceed by looking at how the performance with these 
settings behaves asymptotically and how it compares to conjugate gradient (CG) without precon¬ 
ditioning, a simple and popular iterative solver. Since KOSZ turns out to be not competitive, we 
do not need to compare it to more sophisticated algorithms. 

In Figure l^each occurrence of c stands for a new instance of a real constant. We expect the 
cost of the CG method to scale with 0{n^'^) on 2D grids 0, while our algorithm should scale 
nearly-linearly. This expectation is confirmed in the plot: Using Levenberg-Marquardt lfT9l to 
approximate the curves for CG with a function of the form ax^ + c, we get 6 « 1.5 for FLOPS 
and memory accesses, while the (more technical) wall time and cycle count yield a slightly higher 
exponent 6 « 1.6. We also see that the curves for our algorithm are almost linear from about 
650 X 650. Unfortunately, the hidden constant factor is so large that our algorithm cannot compete 
with CG even for a 1000 x 1000 grid. Note that the difference between the algorithms in FLOPS 
is significantly smaller than the difference in memory accesses and that the difference in running 
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(a) 100 X 100 grid, unweighted (b) 100 x 100 grid, weighted 




(c) Barabasi-Albert, n = 25000, unweighted (d) Barabasi-Albert, n = 25000, weighted 
Fig. 3: Convergence of the residual. Terminate when residual < 10“^. 


time is larger still. This suggests that the practical performance of our algorithm is particularly 
bounded by memory access patterns and not by floating point operations. This is noteworthy when 
we look at our special spanning tree for the 2D grid. We see that using the special ST always 
results in performance that is better by a constant factor. In particular, we save a lot of FLOPS 
(factor 10), while the savings in memory accesses (factor 2) are a lot smaller. Even though the 
FLOPS when using the special ST are within a factor of 2 of CG, we still have a wide chasm in 
running time. 

The results for the Barabasi-Albert graphs are basically the same (and hence not shown in 
detail); Even though the growth is approximately linear from about 400,000 nodes, there is still a 
large gap between our algorithm and CG since the constant factor is enormous. Also, the results 
for the number of ELOPS are again much better than the result for the other performance counters. 
In conclusion, although we have nearly-linear growth, even for 1,000,000 graph nodes, the KOSZ 
algorithm is still not competitive with CG because of huge constant factors, in particular a large 
number of iterations and memory accesses. 

Smoothing. One way of combining the good qualities of two different solvers is smoothing. 
Smoothing means to dampen the high-frequency components of the error, which is usually done 
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(c) FLOPS 


(d) Memory accesses 


Fig. 4: Asymptotic behaviour for 2D grids. Termination when relative residual was < 10 The 
error bars give the standard deviation. 


in combination with another solver that dampens the low-frequency error components. It is known 
that in CG and most other solvers, the low-frequency components of the error converge very fast, 
while the high-frequency components converge slowly. Thus, we are interested in finding an al¬ 
gorithm that dampens the high-frequency components, a good smoother. This smoother does not 
necessarily need to reduce the error, it just needs to make its frequency distribution more favor¬ 
able. Smoothers are particularly often applied at each level of multigrid or multilevel schemes lb] 
that turn a good smoother into a good solver by applying it at different levels of a matrix hierar¬ 
chy. To test whether the Laplacian solver is a good smoother, we start with a fixed x with Lx = h 
and add white uniform noise in [—1,1] to each of its entries in order to get an initial vector xq. 
Then we execute a few iterations of our Laplacian solver and check whether the high-frequency 
components of the error have been reduced. Unfortunately, we cannot directly start at the vec¬ 
tor xq in the solver. Our solution is to use Richardson iteration. That is, we transform the residual 
r = b — Lxq back to the source space by computing L~^r with the Laplacian solver, get the error 
e = X — xq = L~^r and then the output solution xi = xq + L~^r. 
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Fig. 5: The Laplacian solver with the special ST as a smoother on a 32 x 32 grid. For each number 
of iterations of the solver we plot the current error and the absolute values of its transformation 
into the frequency domain. Note that (a) and (k) have a different scale. 
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(i) 1000 iterations, error (j) 1000 iterations, frequency (k) 10000 iterations, error (1) 10000 iterations, frequency 








































Figure shows the error vectors of the solver for a 32 x 32 grid together with their transfor¬ 
mations into the frequency domain for different numbers of iterations of our solver. We see that 
the solver may indeed be useful as a smoother since the energies for the large frequencies (on the 
periphery) decrease rapidly, while small frequencies (in the middle) in the error remain. 

In the solver we start with a flow that is nonzero only on the ST. Therefore, the flow values on 
the ST are generally larger at the start than in later iterations, where the flow will be distributed 
among the other edges. Since we construct the output vector by taking potentials on the tree, after 
one iteration xi will, thus, have large entries compared to the entries of b. In subplot (c) of Figure]^ 
we see that the start vector of the solver has the same structure as the special ST and that its error 
is very large. For the 32 x 32 grid we, therefore, need about 10000 iterations (« 150 SpMVs in 
running time comparison) to get an error of a:i similar to xq even though the frequency distribution 
is favorable. Note that the number of SpMVs the 10000 iterations correspond to depends on the 
graph size, e.g. for an 100 x 100 grid the 10000 iterations correspond to 20 SpMVs. 

While testing the Laplacian solver in a multigrid scheme could be worthwhile, the bad initial 
vector creates robustness problems when applying the Richardson iteration multiple times with a 
fixed number of iterations of our solver. In informal tests multiple Richardson steps lead to ever 
increasing errors without improved frequency behavior unless our solver already yields an almost 
perfect vector in a single run. 

5 Conclusions 

At the time of writing, the presented KOSZ implementation and evaluation provide the first 
comprehensive experimental study of a Laplacian solver with provably nearly-linear running time. 
Our study supports the theoretical result that the convergence of KOSZ crucially depends on the 
stretch of the chosen spanning tree, with low stretch generally resulting in faster convergence. 
This particularly suggests that it is crucial to build algorithms that yield spanning trees with lower 
stretch. Since we have conflrmd and extended Papp’s EOll observation that algorithms with prov¬ 
ably low stretch do not yield good stretch in practice, improving the low-stretch ST algorithms is 
an important future research direction. Even though KOSZ proves to grow nearly linearly as pre¬ 
dicted by theory, the constant seems to be too large to make it competitive, even compared to the 
CG method without preconditioner. Hence, our initial question in the paper title can be answered 
with “yes” and “no” at the same time: The ranning time is nearly linear, but the constant factors 
prevent usefulness in practice. While the negative results may predominate, our effort is the first 
to provide an answer at all. We hope to deliver insights that lead to further improvements, both in 
theory and practice. A promising future research direction is to repair cycles other than just the 
basis cycles in each iteration, but this would necessitate significantly different data structures. 
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Appendix 


A KOSZ Solver Background 

A.l Correspondence between Graphs and Laplacian Matrices 


Table 1: Interpretations given to a Laplacian L = L{G) and a vector x €" where the We. 

for each e € E aie the edge weights. 

e edge/resistor e 

We conductance of resistor e 

re := 1/we resistance of resistor e 

Xu potential at node u 

{Lx)u current flowing out of node u 

bu current required to flow out of node u 


L operates on every vector x via 

{Lx)u = -Xu ■ ^ Wuv + '^ Xu- Wuv 

v^N{u} v^N(u) 

- ^ ^ i^Xy Xu') ' Wuv 

v^N{u) 

for each u G V. 


2 2V 




Fig. 6; Transformation into an electrical network. 
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A.2 Algorithm Components 


Table 2: Summary of the components of the algorithm 


Spanning tree 

Dijkstra 

Kruskal 

Elkin et. al. 1111 
Abraham et. al. (2] 

0 (m log n log log n )) stretch, O (m log n log log n) time 
no stretch bound, 0{m log n) time 
no stretch bound, 0{m log n) time 
(!?(m log^n log log n) stretch, 0(m\og^n) time 

0{m log n log log n) stretch, 0{m log n log log n) time 

Initialize cycle selection 0{m log n) time 

Uniform 

0(m) time 

Weighted 

0{m log n) time 

Initialize flow 

0(nlog7t) time 

LCA flow 

0{n) time 

Log flow 

0{n log n) time 

Iterations 

0(mlognloglognlog(e“^ logn)) expected iterations 

Select a cycle 

O(logn) time 

Uniform 

0(1) time 

Weighted 

(!?(log n) time 

Repair cycle 

0(\ogn) time 

LCA flow 

0{n) time 

Log flow 

C)(log n) time 


Complete solver C)(mlog^riloglognlog(e ^ logn)) expected time 

Improved solver (9(mlog^nloglog?ilog(e“^)) expected time 


B Spanning Tree Results 

B.l Proofof Proposition 

We can inductively show that the average stretch S{ni , 712 ) of the special ST on the ni x 712 grid 
is in 0((ni + 772 )^ log(77i +7i2)/77in2)- To do so, we hrst prove that by the recursive construction 
the distance of a node on a border of the grid to a corner of the same border is in 0{ni + 712 ). 
Thus, the stretches of the 711 + 712 — 3 off-tree edges between the rows [ 712 / 2 J and [ 772 / 2 ] -f 1 as 
well as the columns [ 771 / 2 J and [ 771 / 2 J + 1 are in 0(77i + 772 ) each. Consequently, 

S'(77i, 772 ) = 4 • 5 ( 771 / 2 , 772 / 2 ) - 1 - 0 ( 771 - 1 - 772 )^ 

when disregarding rounding. After solving this recurrence (note that S{ni/2, n2l2) is essentially 
one fourth in size compared to 5 ( 771 , 772)), we get 

5(771,772) = 0 (( 77 l -f 772)^ l 0 g( 77 l -f 772)). 

Since the number of edges of the grid is 0 ( 77777 ), the claim for the average stretch follows. Note 
that in case of a square grid (771 = 772 ) with iV = 771 x 772 vertices, we get 

S{N) = 45(V/4) +0{N)=0{N log N) = 0{n\ log(77i)) 

and thus O (log 771 ) average stretch. □ 


14 








B.2 Overview of spanning tree algorithms and their stretch 


Table 3: Spanning tree algorithms and their guaranteed stretch 

Time Stretch 

f3l 0(rn?) m- exp[0{-\/logn\og\ogn)) 

111! 0(rnlog^ri) m • C>(log^nloglogn) 

m ©(mlog^n) m • C>(logn(loglogn)^) 

(m 0(rn log n log log n) m ■ O (log n (log log n) ^) 

(2) 0[m log n log log n) m ■ O (log n log log n) 

Dijkstra Cl((m + n) log n) No guarantee 

Kruskal 0{ma{n) log nj No guarantee 
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